Fix sort order aware file group parallelization #8517

Merged

alamb merged 3 commits into apache:main from alamb:alamb/bad_redistribution

Dec 17, 2023

Contributor

alamb commented Dec 12, 2023 (edited)

~~Draft as it builds on #8505~~

Which issue does this PR close?

Closes #8451

Rationale for this change

Repartitioning data for pre-sorted listing tables can sometimes produce incorrect results. See the descriptions on #8451 and the comments in this PR for details.
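To make the failure mode concrete, here is a minimal sketch. It assumes a simplified model in which a "file" is just a sorted `Vec<i32>` and a file group emits its ranges back to back. The helper names `is_non_decreasing` and `scan_group` are hypothetical and do not exist in DataFusion.

```rust
/// Returns true if `values` is in non-decreasing order.
fn is_non_decreasing(values: &[i32]) -> bool {
    values.windows(2).all(|w| w[0] <= w[1])
}

/// Concatenate the given slices in order, the way a scan of one file group
/// would emit them (hypothetical model, not DataFusion code).
fn scan_group(ranges: &[&[i32]]) -> Vec<i32> {
    ranges.iter().flat_map(|r| r.iter().copied()).collect()
}

fn main() {
    // Two files, each sorted on its own, with overlapping value ranges.
    let file1 = vec![1, 3, 5, 7];
    let file2 = vec![2, 4, 6, 8];

    // A group holding the tail of file1 followed by the head of file2
    // yields 5, 7, 2, 4, which is no longer sorted.
    let mixed = scan_group(&[&file1[2..], &file2[..2]]);
    assert!(!is_non_decreasing(&mixed));

    // A group holding one contiguous range of a single file stays sorted.
    let single = scan_group(&[&file1[2..]]);
    assert!(is_non_decreasing(&single));

    println!("mixed = {mixed:?}, single = {single:?}");
}
```

Any operator that trusts the declared output ordering of such a mixed group (for example a sort-preserving merge) could then return rows in the wrong order.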

What changes are included in this PR?

Move the code / tests for redistributing files amongst groups to its own module
Add code + tests to handle redistributing files and preserving sort order

Are these changes tested?

Yes, new unit tests and end-to-end coverage (updates to #8505)

Are there any user-facing changes?

Correct answers with pre-sorted data

github-actions bot added core sqllogictest labels

alamb force-pushed the alamb/bad_redistribution branch from f2ec70c to d7026eb Compare

December 12, 2023 17:05

alamb commented

datafusion/core/src/physical_optimizer/enforce_distribution.rs

-                          // ordering is lost here
-                          "RepartitionExec: partitioning=RoundRobinBatch(10), input_partitions=2",
-                          "ParquetExec: file_groups={2 groups: [[x], [y]]}, projection=[a, b, c, d, e], output_ordering=[a@0 ASC]",
+                          "ParquetExec: file_groups={10 groups: [[x:0..20], [y:0..20], [x:20..40], [y:20..40], [x:40..60], [y:40..60], [x:60..80], [y:60..80], [x:80..100], [y:80..100]]}, projection=[a, b, c, d, e], output_ordering=[a@0 ASC]",

Contributor Author

alamb Dec 12, 2023

Each file is now divided into groups that preserve its order, so no re-sort is needed
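The grouping in the expected plan above can be reproduced with a small sketch of an order-preserving split. This is an illustration under stated assumptions, not the code in `file_groups.rs`: every input group holds exactly one file, each new group goes to the file whose current slices are largest, and ties go to the lower file index. The names `FileRange` and `split_preserving_order` are hypothetical.

```rust
use std::cmp::Reverse;
use std::ops::Range;

/// Hypothetical stand-in for a file range: (file name, byte range).
type FileRange = (String, Range<u64>);

/// Split single-file groups into `target` groups without mixing files.
/// `files[i]` is the only file of input group `i`, as (name, size in bytes).
/// Returns None when there are already at least `target` groups.
fn split_preserving_order(files: &[(&str, u64)], target: usize) -> Option<Vec<Vec<FileRange>>> {
    if files.is_empty() || files.len() >= target {
        return None;
    }
    // assigned[i] lists the output groups that will read a slice of file i.
    let mut assigned: Vec<Vec<usize>> = (0..files.len()).map(|i| vec![i]).collect();
    for new_group in files.len()..target {
        // Give each new group to the file whose current slices are largest;
        // ties go to the lower file index (an assumption of this sketch).
        let pick = (0..files.len())
            .max_by_key(|&i| (files[i].1 / assigned[i].len() as u64, Reverse(i)))
            .unwrap();
        assigned[pick].push(new_group);
    }
    let mut groups: Vec<Vec<FileRange>> = vec![Vec::new(); target];
    for (i, &(name, size)) in files.iter().enumerate() {
        let step = size / assigned[i].len() as u64;
        let last = assigned[i].len() - 1;
        for (k, &g) in assigned[i].iter().enumerate() {
            let start = step * k as u64;
            // The last slice absorbs any remainder so the whole file is covered.
            let end = if k == last { size } else { start + step };
            groups[g].push((name.to_string(), start..end));
        }
    }
    Some(groups)
}

fn main() {
    // Two files of 100 bytes each, split into 10 groups as in the plan above.
    let groups = split_preserving_order(&[("x", 100), ("y", 100)], 10).unwrap();
    for (i, g) in groups.iter().enumerate() {
        println!("group {i}: {g:?}");
    }
}
```

Because every output group reads one contiguous range of one file, each group is sorted whenever the file itself is sorted.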

datafusion/sqllogictest/test_files/repartition_scan.slt

    
            @@ -118,7 +118,7 @@ physical_plan
          
              SortPreservingMergeExec: [column1@0 ASC NULLS LAST]

              --CoalesceBatchesExec: target_batch_size=8192

              ----FilterExec: column1@0 != 42

              ------ParquetExec: file_groups={4 groups: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/repartition_scan/parquet_table/1.parquet:0..200], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/repartition_scan/parquet_table/1.parquet:200..394, WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/repartition_scan/parquet_table/2.parquet:0..6], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/repartition_scan/parquet_table/2.parquet:6..206], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/repartition_scan/parquet_table/2.parquet:206..403]]}, projection=[column1], predicate=column1@0 != 42, pruning_predicate=column1_min@0 != 42 OR 42 != column1_max@1

              ------ParquetExec: file_groups={4 groups: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/repartition_scan/parquet_table/1.parquet:0..197], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/repartition_scan/parquet_table/2.parquet:0..201], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/repartition_scan/parquet_table/2.parquet:201..403], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/repartition_scan/parquet_table/1.parquet:197..394]]}, projection=[column1], output_ordering=[column1@0 ASC NULLS LAST], predicate=column1@0 != 42, pruning_predicate=column1_min@0 != 42 OR 42 != column1_max@1

Contributor Author

alamb Dec 12, 2023

This shows the test added in #8505 is now fixed (the two files are not intermixed)
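The "not intermixed" property can be checked mechanically. The sketch below copies the byte ranges from the two `ParquetExec` lines in the `repartition_scan.slt` diff above (paths shortened to file names) and tests whether any group reads from more than one file. The function `no_group_mixes_files` is a hypothetical helper, not part of DataFusion.

```rust
use std::ops::Range;

/// True when every group reads from at most one file, i.e. no group
/// intermixes ranges of different files.
fn no_group_mixes_files(groups: &[Vec<(&str, Range<u64>)>]) -> bool {
    groups
        .iter()
        .all(|group| group.windows(2).all(|w| w[0].0 == w[1].0))
}

fn main() {
    // Ranges from the plan whose second group reads from both files.
    let mixed_plan = vec![
        vec![("1.parquet", 0..200)],
        vec![("1.parquet", 200..394), ("2.parquet", 0..6)],
        vec![("2.parquet", 6..206)],
        vec![("2.parquet", 206..403)],
    ];
    // Ranges from the plan that reports output_ordering.
    let ordered_plan = vec![
        vec![("1.parquet", 0..197)],
        vec![("2.parquet", 0..201)],
        vec![("2.parquet", 201..403)],
        vec![("1.parquet", 197..394)],
    ];
    assert!(!no_group_mixes_files(&mixed_plan));
    assert!(no_group_mixes_files(&ordered_plan));
    println!("ordered plan keeps files separate");
}
```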

datafusion/core/src/datasource/physical_plan/file_groups.rs

+              use std::collections::BinaryHeap;
+              use std::iter::repeat_with;
+              /// Repartition input files into `target_partitions` partitions, if total file size exceed

Contributor Author

alamb Dec 12, 2023

I added a bunch of comments trying to clarify what this code was supposed to be doing

datafusion/core/src/datasource/physical_plan/file_groups.rs

+              ///                                      divides into 4 groups
+              /// ```
+              #[derive(Debug, Clone, Copy)]
+              pub struct FileGroupPartitioner {

Contributor Author

alamb Dec 12, 2023

The API is new, but the file distribution algorithm is the same for unordered inputs

datafusion/core/src/datasource/physical_plan/file_groups.rs

+                      &self,
+                      file_groups: &[Vec<PartitionedFile>],
+                  ) -> Option<Vec<Vec<PartitionedFile>>> {
+                      let target_partitions = self.target_partitions;

Contributor Author

alamb Dec 12, 2023

This is the old algorithm, unmodified
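For readers unfamiliar with the unordered strategy, a rough sketch follows: files are laid end to end and cut into `target` chunks of about equal byte size, so one chunk may span two files. This is an approximation for illustration only. It omits the `repartition_file_min_size` check, and the names `FileRange` and `split_evenly_by_size` are hypothetical.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a file range: (file name, byte range).
type FileRange = (String, Range<u64>);

/// Pack files end to end into at most `target` groups of roughly equal size.
fn split_evenly_by_size(files: &[(&str, u64)], target: usize) -> Vec<Vec<FileRange>> {
    let total: u64 = files.iter().map(|f| f.1).sum();
    if total == 0 || target == 0 {
        return Vec::new();
    }
    // Ceiling division so that `target` groups always cover `total` bytes.
    let group_size = (total + target as u64 - 1) / target as u64;
    let mut groups: Vec<Vec<FileRange>> = vec![Vec::new()];
    let mut filled = 0u64; // bytes already placed in the current group
    for &(name, size) in files {
        let mut start = 0u64;
        while start < size {
            if filled == group_size {
                // Current group is full: open the next one.
                groups.push(Vec::new());
                filled = 0;
            }
            let take = (size - start).min(group_size - filled);
            groups
                .last_mut()
                .unwrap()
                .push((name.to_string(), start..start + take));
            start += take;
            filled += take;
        }
    }
    groups
}

fn main() {
    // File sizes taken from the repartition_scan.slt plan earlier in this PR.
    let groups = split_evenly_by_size(&[("1.parquet", 394), ("2.parquet", 403)], 4);
    for (i, g) in groups.iter().enumerate() {
        println!("group {i}: {g:?}");
    }
}
```

With these sizes the second group holds the tail of `1.parquet` and the first 6 bytes of `2.parquet`, which is fine for unordered input but is exactly the kind of mixing that breaks a declared sort order.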

datafusion/core/src/datasource/physical_plan/file_groups.rs

+              mod test {
+                  use super::*;
+                  /// Empty file won't get partitioned

Contributor Author

alamb Dec 12, 2023

The first set of tests are the original tests, though I refactored them so they didn't rely on ParquetExec

datafusion/core/src/datasource/physical_plan/file_groups.rs

+                  }
+                  #[test]
+                  fn repartition_ordered_no_action_too_few_partitions() {

Contributor Author

alamb Dec 12, 2023

New tests start here

datafusion/core/src/datasource/physical_plan/file_scan_config.rs

                   pub fn repartition_file_groups(
                       file_groups: Vec<Vec<PartitionedFile>>,
                       target_partitions: usize,
                       repartition_file_min_size: usize,
                   ) -> Option<Vec<Vec<PartitionedFile>>> {
-                      let flattened_files = file_groups.iter().flatten().collect::<Vec<_>>();

Contributor Author

alamb Dec 12, 2023

refactored into datafusion/core/src/datasource/physical_plan/file_groups.rs

datafusion/core/src/datasource/physical_plan/mod.rs

@@ -809,345 +810,4 @@ mod tests {
                           extensions: None,
                       }
                   }
-                  /// Unit tests for `repartition_file_groups()`
-                  #[cfg(feature = "parquet")]

Contributor Author

alamb Dec 12, 2023

These tests were moved and refactored into datafusion/core/src/datasource/physical_plan/file_groups.rs

datafusion/core/src/physical_optimizer/enforce_distribution.rs

@@ -3862,6 +3863,56 @@ pub(crate) mod tests {
                       Ok(())
                   }
+                  #[test]
+                  fn parallelization_multiple_files() -> Result<()> {

Contributor Author

alamb Dec 12, 2023

This test fails on main

alamb mentioned this pull request

Incorrect results due to repartitioning a sorted ParquetExec #8451

Closed

alamb changed the title ~~Fix sort order aware redistribution~~ Fix sort order aware file group paralleization

alamb added 2 commits

December 13, 2023 16:09


          Minor: Extract file group repartitioning and tests into `FileGroupRepartitioner`

0a9dfb9


          Implement sort order aware redistribution

0b558bc

alamb force-pushed the alamb/bad_redistribution branch from d7026eb to 0b558bc Compare

December 13, 2023 21:09

alamb marked this pull request as ready for review

December 13, 2023 21:09

alamb changed the title ~~Fix sort order aware file group paralleization~~ Fix sort order aware file group parallelization

Dandandan approved these changes


          Merge remote-tracking branch 'apache/main' into alamb/bad_redistribution

ba76c95

Contributor Author

alamb commented Dec 17, 2023

Thank you for the review @Dandandan

alamb merged commit 2e16c75 into apache:main

22 checks passed

matthewgapp mentioned this pull request

matt/feat/recursive ctes/config flag matthewgapp/arrow-datafusion#3

Closed

Labels

core sqllogictest